frequencies varied by continent in a way that corresponds with observed population differences in average
phenotypic intelligence. Average allele frequencies for intelligence GWAS hits exhibited higher inter-population
variability than random SNPs matched to the GWAS hits or GWAS hits for height. This indicates stronger directional
polygenic selection for intelligence relative to height. Random sets of SNPs and Fst distances were employed to
deal with the issue of autocorrelation due to population structure. GWAS hits were much stronger predictors
of IQ than random SNPs. Regressing IQ on Fst distances did not significantly alter the results; nonetheless, it
demonstrated that, whilst population structure due to genetic drift and migrations is indeed related to IQ differences
between populations, the GWAS hit frequencies are independent predictors of aggregate IQ differences.
1. Introduction


Over the last few years, population geneticists have moved away 
from the study of genetic evolution using the single-gene, Mendelian 
approach, towards models that examine many genes together (i.e. poly- 
genic models). The origin of this trend can be traced back to Fisher 
(1918). The more genes that are involved in a given phenotype, the 
more the signal of natural selection will be “dispersed” across different 
genomic regions (this is because each gene accounts for only a small 
proportion of the overall phenotypic variance) making individual 
genes difficult to detect using single gene-based approaches (e.g.
Pritchard et al., 2010; Piffer, 2014). The first attempt at empirically
identifying polygenic selection was made by Turchin et al. (2012) utilizing
height alleles (obtained from Genome Wide Association Studies, hence- 
forth GWAS) and two large populations (Northern and Southern 
Europeans). It was found that the Northern European population exhib- 
ited higher frequencies of height-increasing alleles (obtained from 
GWAS studies) — suggesting greater polygenic selection for height- 
enhancing alleles in this population relative to the Southern 
Europeans. The study had two principal drawbacks: i) it relied
on populations sourced from a single continent and ii) crude
country-level pairwise comparisons (e.g. French vs. Italian) were used
without correlating frequency differences to average population height.
Moreover, the strength of selection was not determined.

Two different approaches to identify selection based on the correla- 
tion of allele frequencies across different populations have been recent- 
ly developed by Piffer (2013) and Berg and Coop (2014). 

Piffer's method utilizes factor analysis of trait-increasing alleles 
(found by GWAS) as a tool for quantifying the strength of selection on 
a phenotype and the underlying genetic variation (Piffer, 2013, see 
also: Minkov & Bond, 2015). An additional methodology consists of 
computing the correlations between genetic frequencies and the aver- 
age phenotypes of different populations; then, utilizing the method of 
correlated vectors (Jensen & Weng, 1994) to correlate the resulting 
vector of correlation coefficients with the corresponding vector of the 
alleles’ factor loadings. If the alleles are associated with signals of genetic 
selection, a positive correlation will result, as alleles exhibiting high fac- 
tor loadings can be expected to exhibit a stronger correlation with the 
phenotype of interest (Piffer, 2014). 
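
The method of correlated vectors described above can be illustrated with a minimal sketch. The toy allele names, frequencies, phenotype values and loadings below are hypothetical and serve only to show the mechanics; the original analyses were run in R, and this Python version is an assumption-laden illustration rather than the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(freqs, phenotype, loadings):
    """Correlate each allele's phenotype correlation with its factor loading.

    freqs:     allele -> list of frequencies, one per population
    phenotype: list of population-level phenotype averages
    loadings:  allele -> factor loading
    """
    alleles = sorted(freqs)
    # Vector 1: correlation of each allele's frequency with the phenotype.
    r_vector = [pearson(freqs[a], phenotype) for a in alleles]
    # Vector 2: the same alleles' factor loadings.
    loading_vector = [loadings[a] for a in alleles]
    return pearson(r_vector, loading_vector), r_vector


# Hypothetical toy data: three alleles measured in four populations.
toy_freqs = {
    "allele_a": [0.1, 0.2, 0.3, 0.4],
    "allele_b": [0.4, 0.3, 0.2, 0.1],
    "allele_c": [0.1, 0.3, 0.2, 0.4],
}
toy_phenotype = [1.0, 2.0, 3.0, 4.0]
toy_loadings = {"allele_a": 0.9, "allele_b": -0.9, "allele_c": 0.5}

mcv_r, r_vector = correlated_vectors(toy_freqs, toy_phenotype, toy_loadings)
print(round(mcv_r, 3))
```

A positive `mcv_r` means that alleles with higher loadings also track the phenotype more closely, which is the pattern the method looks for.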

To date, GWA studies have identified a handful of alleles with 
replicated association with intelligence and proxy phenotypes, such as 
educational attainment. Rietveld et al.’s (2013) meta-analysis identified 
10 Single Nucleotide Polymorphisms (henceforth SNPs) that increased 
educational attainment, encompassing three with nominal genome- 
wide significance and seven with suggestive significance. A recent 
study (Ward et al., 2014) replicated the positive effect of these top 
three SNPs (rs9320913, rs11584700 and rs4851266) on mathematics
and reading performance in an independent sample of school children. 
These SNPs were also found to be associated with g (general intelli- 
gence) in a sub-sample of Rietveld et al.’s original study. 

Another SNP (rs236330), located within gene FNBP1L, was found to
be significantly associated with general intelligence — as was reported 
in two separate studies (Benyamin et al., 2013; Davies et al., 2011). 
This gene exhibits high levels of expression in neurons, including hippo- 
campal neurons and developing brains, where it regulates neuronal 
morphology (Davies et al., 2011). 

Rietveld et al. (2014), utilizing the proxy-phenotype method, found 
three additional SNPs significantly associated with cognitive perfor- 
mance in a sample overlapping that used in their previous study 
(Rietveld et al., 2013). 

More recently, a GWAS focusing on fluid ability found 13 genetic
variants exhibiting genome-wide significance (Davies et al., 2015);
however, only three SNPs exhibited independent signals (i.e. were not
in linkage disequilibrium [LD] with each other).

A significant GWAS association with executive function and processing
speed was reported for the SNP rs17518584 (p = 3.28 × 10^-9 after
adjustment for age, gender and education, N = 5429 to 32070). This
genetic variant is located in an intron of the gene Cell ADhesion
Molecule 2 (CADM2) (Ibrahim-Verbaas et al., 2015).

The aim of the present paper is to analyze the population allele fre- 
quency patterns of all of the SNPs found to date (July 2015) to have 
genome-wide significant associations with general intelligence or 
related cognitive phenotypes (executive functioning, processing 
speed, educational attainment) in order to test the hypothesis that 
they predict country-level differences in cognitive ability above and be- 
yond that predicted on the basis of neutral population genetic mecha- 
nisms (i.e. population structure due to migration and drift) which is a 
potential source of autocorrelation. The latter results from the
non-independence of variables sampled in spatial or temporal proximity to
one another (e.g. Mace & Pagel, 1994; for detailed criticisms of the
concept as commonly, and naively, applied in cross-cultural research see:
Thornhill & Fincher, 2013). Autocorrelation is considered problematic
because non-independence among supposedly spatially discrete
variables violates the assumption of independently and identically
distributed errors, and hence leads to false positive (Type I) errors.
Demonstrating that the alleles predict country-level differences in
cognitive ability above and beyond that predicted on the basis of
migration, drift, etc. can be taken as evidence for the theory that these
differences have been shaped by diffuse polygenic selection operating on
these alleles.


2. Methods 
2.1. Variables 


Allele frequencies for the SNPs were downloaded from the 1000
Genomes database (The 1000 Genomes Project Consortium, 2012), using
the final release of phase three data:
http://browser.1000genomes.org/index.html

A common factor was extracted from among the SNPs, utilizing un- 
weighted least squares factor analysis, yielding a metagene — this being 
a term utilized in genetics to describe patterns of covariance among 
genes. 

Two alternative metagenes were constructed, one utilizing only the 
four SNPs (rs10457441, rs11584700, rs4851266, rs236330) exhibiting
replicated GWAS associations with g and related phenotypes (hence- 
forth “four SNPs metagene”; Piffer, 2015a), and a second utilizing all 
nine alleles discovered to date (henceforth “nine SNP metagene”). In ad- 
dition to these metagene scores, a polygenic score was also computed 
by averaging across the allele frequencies. 
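
As a minimal sketch of the polygenic score just described (the unweighted mean of the trait-increasing allele frequencies in a population), the following uses hypothetical population labels and frequencies. It does not reproduce the metagene, which additionally requires a factor analysis.

```python
def polygenic_score(allele_freqs):
    """Unweighted mean frequency of the trait-increasing alleles."""
    return sum(allele_freqs) / len(allele_freqs)


# Hypothetical frequencies of three trait-increasing alleles in two populations.
toy_populations = {
    "pop_1": [0.2, 0.4, 0.6],
    "pop_2": [0.5, 0.7, 0.9],
}

scores = {pop: polygenic_score(f) for pop, f in toy_populations.items()}
for pop, score in scores.items():
    print(pop, round(score, 3))
```

Because every allele carries the same weight, the score ignores differences in effect size between SNPs; that is a property of the simple average, not a recommendation.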

Country-level average IQs were obtained from Lynn and Vanhanen
(2012). The Finnish and Vietnamese IQs were both adjusted upwards
to 101 (from 97 in Lynn & Vanhanen, 2012), to account for recent,
more accurate estimates of the Finnish population (Armstrong,
Woodley, & Lynn, 2014) and a recent reanalysis of the Vietnamese
data (Rindermann, Hoang, & Baumeister, 2013). IQ for Tuscany was
calculated as the average between the IQs estimated from PISA Creative
Problem Solving (Piffer & Lynn, 2014) and from PISA Math, Science,
Reading. There were three populations (Chinese Dai, Gujarati Indian
and Indian Telugu) for which no estimates of average IQ were available.
These populations were therefore excluded from the analyses.


2.2. Autocorrelation control 


The method for controlling for possible autocorrelation stemming 
from population structure is based on taking the correlation between 
Fst distances for the entire genome (or a random part of it) and
distances (that is, the absolute value of the difference between any
two populations) on the factor for all of the populations. Fst is a measure
of population differentiation due to genetic structure, which is based on 
the partitioning of genetic diversity within and between populations. 
The vast majority of random SNPs all over the genome are believed 
not to be associated with specific phenotypes and therefore not to 
have been subject to selection. They rather represent population struc- 
ture resulting from random genetic drift, migration, admixture, and 
similar processes. 

Fst is calculated as the ratio of genetic variance in allele frequencies 
between populations to the sum of variance between gametes within 
individuals and within populations (Holsinger & Weir, 2009; Piffer & 
Dall'Olio, 2015; Weir & Cockerham, 1984). 

Two matrices representing genetic distances with N unique pairwise
comparisons are generated, where N = n(n - 1)/2 and n is the number
of populations. Another matrix representing phenotypic distances
(computed for average population IQ) is then created.
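
The pair count and the distance construction can be sketched as follows. With the 26 populations used in this study the formula gives 325 pairs, and 253 once the three populations without IQ estimates are dropped. The helper names are hypothetical; the three IQ values are taken from Table 2.

```python
from itertools import combinations


def n_pairs(n):
    """Number of unique pairwise comparisons among n populations."""
    return n * (n - 1) // 2


def pairwise_abs_distances(values):
    """Absolute difference for every unique pair, skipping missing values.

    values: population -> score (None when no estimate is available)
    """
    valid = {pop: v for pop, v in values.items() if v is not None}
    return {
        (a, b): abs(valid[a] - valid[b])
        for a, b in combinations(valid, 2)
    }


# Population IQs from Table 2; Chinese Dai has no estimate.
iq = {"Finnish": 101, "British": 100, "Spanish": 97, "Chinese Dai": None}
iq_distances = pairwise_abs_distances(iq)

print(n_pairs(26), n_pairs(23))
print(iq_distances)
```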

Two steps are employed in testing the hypothesis that the factor 
does not merely represent population structure: 


1. The correlation between the two matrices representing genetic dis- 
tances is calculated. The lower the value, the more likely that the re- 
sult is due to selection rather than population structure, as selection 
will skew distances away from background neutral variation due to 
random drift. 

2. A regression of phenotypic distances on factor distances coupled 
with genome-wide Fst distances is carried out. If factor distances 
have an independent positive effect on the dependent variable 
(phenotypic distances), then the result is more likely to be indicative 
of polygenic selection. 
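
Step 2 can be sketched for the two-predictor case. The closed-form expressions below give the standardized Betas of an ordinary least squares regression with exactly two predictors; the variable names and the toy distances are hypothetical, and this is an illustration of the idea rather than the R code used for the reported results.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(y, x1, x2):
    """Standardized regression coefficients of y on x1 and x2."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical distances: IQ distances depend on factor distances only,
# while Fst distances are merely correlated with the factor distances.
factor_dist = [1.0, 2.0, 3.0, 4.0, 5.0]
fst_dist = [2.0, 1.0, 4.0, 3.0, 6.0]
iq_dist = [2.0 * d for d in factor_dist]

beta_factor, beta_fst = standardized_betas(iq_dist, factor_dist, fst_dist)
print(round(beta_factor, 3), round(beta_fst, 3))
```

In this toy case the factor distances keep their effect once Fst distances are entered, which is the pattern that step 2 treats as more indicative of selection than of population structure.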


All analyses were carried out in R (v. 3.2.1).


3. Results


The correlation between the most recent GWAS hit (allele T of 
rs17518584) and the four SNPs metagene was r = .96 (N = 26
populations). 

The two hits (rs17522122.G and rs10119.G) from Davies et al.
(2015) that were found not to be in LD with any of the four replicated
SNPs (thus excluding rs10457441, which was found to be in LD with
rs9320913 from Rietveld et al., 2013) were correlated with the four
SNPs metagene. The Pearson's r values were -.514 and .698 (both
p < .05) respectively.

One hit (rs1487441) located on chromosome six (position
98660615) from Rietveld et al. (2014) was in LD with one hit from
Davies et al. (2015; rs10457441) and one from Rietveld et al. (2013;
rs9320913). The other two hits (rs7923609 and rs2721173) were,
however, independent signals. Their correlations with the four SNPs
metagene were .322 (ns) and -.697 (p < .05) respectively.

An unweighted least squares factor analysis was carried out on all
nine SNPs (one from Ibrahim-Verbaas et al., 2015, two from Davies
et al., 2015, two from Rietveld et al., 2014, in addition to the four SNPs
analyzed in Piffer, 2015a).


Table 1
Structure matrix for nine cognitive ability increasing alleles, showing the loading of the nine SNP metagene on each of its constituent SNPs.

Chromosome  Location    SNP (ancestral vs derived)  Factor loading  Reference
6*          98572120    rs10457441.C (A)**          .70             Davies et al. (2015)
14          32372633    rs17522122.G (D)            -.52            Davies et al. (2015)
19          50098513    rs10119.G (D)               .78             Davies et al. (2015)
1           204576983   rs11584700.G (D)            .82             Rietveld et al. (2013)
2           100818479   rs4851266.T (D)             .93             Rietveld et al. (2013)
1           94059554    rs236330.C (D)              .91             Davies et al. (2011), Benyamin et al. (2013)
3           85604923    rs17518584.T (D)            .97             Ibrahim-Verbaas et al. (2015)
10          65133822    rs7923609.G (D)             .35             Rietveld et al. (2014)
8           145744429   rs2721173.C (A)             -.84            Rietveld et al. (2014)

* In linkage disequilibrium with two hits from two published GWA studies: rs9320913, location: 98,584,733 (Rietveld et al., 2013); rs1487441, location: 98,553,894 (Rietveld et al., 2014). Another recent study (Trampush et al., 2015) replicated the effect of this locus (specifically, rs1906252) on cognitive function.
** The alleles with a positive effect on IQ for the other SNPs linked to rs10457441 (minor allele) are derived for rs9320913 (allele A), rs1487441 (allele A) and rs1906252 (allele A). This SNP, having an ancestral allele with a cognitive ability enhancing effect, seems to be an anomaly that contrasts with the other three GWAS hits in the same chromosomal region having derived alleles. This anomaly could be due to an error in the dbSNP database.


A single factor was extracted, explaining 61% of the variance. 

The structure matrix is shown in Table 1.

The factor scores for both the nine and four SNP metagenes, along 
with the polygenic score (the average of the allele frequencies) and 
population average IQs are shown in Table 2. 

There was a positive and significant correlation between the nine 
SNPs metagene and IQ (r = .863, N = 23). 

The method of correlated vectors was used to assess the predictive
validity of the factor analysis. The vector of the SNPs' correlations with
national IQs was correlated with the vector of their factor loadings.

There was a positive and significant correlation between the two 
vectors (r = .986). 


3.1. Randomization


Forty random SNPs matched to the nine GWAS hits were obtained
using SNPsnap (Pers, Timshel, & Hirschhorn, 2015). These were used
to test the hypothesis that the signal provided by the GWAS hits can
be distinguished from background noise. That is to say, the factor
extracted from the GWAS hits should be a better predictor of national
IQ than factors extracted from randomly matched SNPs.

Ten sets of four and four sets of nine SNPs were factor analyzed.
Country IQ was then regressed on each resulting random factor
together with the four or nine SNP metagene factor, respectively. The
polygenic score was also entered into the regression as a predictor
along with polygenic scores obtained from the randomly selected SNPs.
As was discussed in the methods, this polygenic score was calculated as
the average frequency of the nine cognitive-ability increasing alleles.
Its correlation with population-level IQ was r = .91 (N = 23
populations). The relationship is summarized in Fig. 1. Beta
coefficients from the multiple-regression analysis are reported in
Tables 3, 4 and 5.

The average Beta was calculated using the absolute value for the
random SNPs and the signed value for the GWAS hits. Whilst this inflated
the values of the random SNP Betas, it is nonetheless based on the
conservative assumption that the majority of the GWAS hits factor
loadings are positive only by chance. GWAS hits produced higher Betas
than the random SNPs (1.03 vs. .279).

In the analyses using the four and nine GWAS hit SNP metagenes,
these were better predictors of IQ than the random SNPs factor in all
14 instances. For the analyses using the polygenic scores, the GWAS hits
were a better predictor in all four instances.


Table 2
Factor scores computed for both the nine and four SNP metagene, along with population-average IQ, polygenic score and population/continent of origin.

Continent*  Population                           Polygenic score  Nine SNPs metagene score  Four SNPs metagene score  Population average IQ
AFR         Afro Caribbean Barbados              .3726            -1.3174                   -1.26112                  83
AFR         African Americans                    .3909            -1.2251                   -1.21019                  85
AFR         Esan Nigerian                        .3607            -1.6754                   -1.45081                  71
AFR         Gambian                              .3451            -1.5180                   -1.44724                  62
AFR         Luhya Kenyan                         .334             -1.6720                   -1.5391                   74
AFR         Mende Sierra Leonean                 .3516            -1.4886                   -1.2412                   64
AFR         Yoruban                              .3391            -1.6197                   -1.4649                   71
HISP        Colombian                            .4852            .1390                     -.12223                   83.5
HISP        Mexican (immigrants, Los Angeles)    .4871            .3748                     .02157                    88
HISP        Peruvian                             .5006            .3496                     -.30414                   85
HISP        Puerto Rican                         .4792            .1013                     .00753                    83.5
E.ASN       Chinese Dai                          .5568            1.2361                    1.18278                   N/A
E.ASN       Han Chinese Beijing                  .6182            1.2349                    1.39839                   105
E.ASN       Han Chinese South                    .6               1.1606                    1.30377                   105
E.ASN       Japanese                             .6076            .9399                     1.2297                    105
E.ASN       Vietnamese                           .5914            1.2287                    1.59826                   99.4
EUR         American Whites (Utah)               .5298            .4879                     .75587                    99
EUR         Finnish                              .54              .5797                     .71432                    101
EUR         British                              .5427            .5357                     .84863                    100
EUR         Spanish                              .5294            .3580                     .59903                    97
EUR         Italian (Tuscany)                    .5229            .4469                     .56805                    99
SAS         Bengali Bangladeshi                  .4858            .1920                     -.25727                   81
SAS         Gujarati Indian (immigrants, Texas)  .5126            .5075                     .47096                    N/A
SAS         Indian Telugu UK                     .5066            .2838                     -.60945                   N/A
SAS         Punjabi Pakistani                    .4976            .2230                     .18886                    84
SAS         Sri Lankan (immigrants, UK)          .4754            .1371                     -.60945                   79

* AFR = Sub-Saharan African; HISP = Hispanic/Latin American; E.ASN = East Asian; EUR = European; SAS = South Asian.


[Figure 1: scatterplot titled "National IQ and Polygenic score"; x-axis: Polygenic Score GWAS Hits (0.35 to 0.60).]

Fig. 1. Relationship between national IQ and polygenic score.

The amount of multicollinearity was calculated with the variance 
inflation factor (VIF). A rule of thumb is that if VIF > 10, then 
multicollinearity is high (Kutner, Nachtsheim, & Neter, 2004). The re- 
gressions with the factors of four SNPs and with random polygenic 
scores (Tables 3 and 5, respectively) had VIFs well below 10. However, 
the regressions with the factors of nine SNPs exhibited high 
multicollinearity with a VIF index well in excess of 10 (Table 4). 
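
For a regression with exactly two predictors (which the layouts of Tables 3 to 5 suggest, although this is an assumption), the VIF reduces to 1 / (1 - r^2), where r is the correlation between the two predictors. The sketch below shows the relation and its inverse; the function names are hypothetical, and the R command actually used was "vif" from the package "car".

```python
import math


def vif_two_predictors(r):
    """VIF shared by both predictors when only two are in the model."""
    return 1.0 / (1.0 - r ** 2)


def implied_abs_correlation(vif):
    """Absolute predictor correlation implied by a two-predictor VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


# The rule-of-thumb threshold and the largest VIF reported in Table 4.
for vif in (10.0, 28.84):
    print(vif, round(implied_abs_correlation(vif), 3))
```

Under this assumption, a VIF above 10 corresponds to the two predictors correlating at roughly .95 or more in absolute value.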


Table 3
Beta coefficients resulting from the regression of population IQ on the four SNPs metagene
and the Random SNP factors.

Set           Random SNP factor   Four SNP metagene (GWAS hits)   VIF*
1             -.058               .867                            3.623
2             .130                .993                            1.521
3             -.29                .655                            5.503
4             -.087               .866                            1.540
5             -.092               .991                            2.860
6             .081                .985                            3.303
7             .038                .946                            2.382
8             -.155               1.051                           3.889
9             -.172               .759                            6.626
10            .017                .930                            2.347
Average Beta  .112                .9043


Table 4
Beta coefficients resulting from the regression of population IQ on the nine SNPs metagene
and the Random SNP factors.

Set           Random SNP factor   Nine SNP metagene (GWAS hits)   VIF
1             -.372               .569                            2.682
2             .014                .849                            28.84
3             .765                1.611                           22.47
4             .869                1.695                           12.02
Average Beta  .505                1.181


Table 5
Beta coefficients resulting from the regression of population IQ on the nine GWAS hits
polygenic score and a random SNP polygenic score.

Set           Random SNPs polygenic score   9 GWAS hits polygenic score   VIF
1             -.049                         .875                          2.028
2             -.180                         .964                          1.097
3             .487                          1.344                         4.826
4             -.169                         .825                          1.348
Average Beta  .221                          1.002

* Variance inflation factor: command “vif”, R package “car”.


MCV was run on the four random sets of SNPs. Contrary to
expectations, it yielded high-magnitude correlations. These are
reported in Table 6.


Table 6
Results of MCV correlating the vector of IQ-allele frequency correlations with the vector of
the loadings of a nine random matched SNP factor on its constituent alleles. Four vector
correlations were computed in total, one for each set of nine random SNPs.

First random SNP set   Second random SNP set   Third random SNP set   Fourth random SNP set
-.926*                 .957*                   -.978                  -.974

* All p < .05.
** A positive correlation indicates that SNPs with higher factor loadings have a higher
correlation to country IQ.


This result is likely due to the presence of phylogenetic
autocorrelation. In this particular instance, the autocorrelation results
from populations that are closer in space also being genetically more
similar.

This could also account for the observation of high-magnitude
correlations between the factors extracted from the random sets of
SNPs and population IQ, and the GWAS hits metagenes (see the
correlation matrix reported in the supplementary files).

Whilst the regression analyses indicate that the GWAS hits factors
are better predictors of IQ than the random factors in the majority of
cases, the latter still show a strong correlation with country-level IQ.
The average correlation with IQ (absolute correlation coefficient) is
r = .74 (p < .05). The average correlation between the GWAS hits
metagene and IQ is r = .89 (p < .05).

It is thus necessary to deal with autocorrelation. This can be achieved 
by controlling for genome-wide genetic distances utilizing the proce- 
dure employed by Piffer (2015b) and described in Section 2.2. 


3.2. Controlling for population structure 


The same procedure applied by Piffer (2015b) will be extended to 
the nine GWAS hits metagene. Fst distances (Weir & Cockerham,
1984) for Chromosomes 1 and 21 (the largest and smallest, respectively)
will be utilized.
These were calculated using VCFtools on the 1000 Genomes, phase 3 
files. VCFtools is a program for working with Variant Call Format (VCF) 
files, like those generated by the 1000 Genomes Project (Danecek 
et al., 2011). The R code accompanying VCFtools is reported in the 
Appendix (A1). 

In addition, distances calculated for the fourteen random SNP factors 
will be used in a simulation: they will be entered as independent vari- 
ables in a regression of IQ on Fst distances, instead of the GWAS hits 
metagene. If they fail to predict IQ above and beyond Fst distances, 
whilst the metagene does, then this will indicate that the metagene 
has predictive power independent of population structure. 

Inspection of the correlation matrix¹ reveals that population IQ
distances exhibit a stronger correlation with the metagene and polygenic
score distances (r x gdist = .78; gdist_metagene = .65; Polygenic =
.76) than with the distances representing population structure
(‘randomnine’) (r x Fst Chromosome 1 = .58; Fst Chromosome 21 =
.58; randomnine 1 = .52; randomnine 2 = .64; randomnine 3 = .57;
randomnine 4 = .52).

The R function “CIrdif” (package “psychometric”) was used to calculate
the confidence intervals for the difference between correlation
coefficients, comparing the correlations of each pair of variables with
country IQ. Two of the three correlation differences are significantly
greater than zero. The 95% Confidence Intervals for the comparisons
are r x Fst Chromosome 1 vs r x four SNPs metagene: .068 to .32; r x
Fst Chromosome 1 vs r x nine SNPs metagene: -.06 to .21; r x random
nine SNPs² vs Polygenic score (nine Hits): .068 to .33.
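
A confidence interval of this kind can be approximated with a simple large-sample formula. The sketch below assumes a standard error of sqrt((1 - r1^2)/n1 + (1 - r2^2)/n2) and n = 253 distance pairs for both correlations; whether this matches the internals of the R function is an assumption, so small discrepancies from the reported intervals are to be expected. The input correlations are the rounded values given in the text.

```python
import math
from statistics import NormalDist


def ci_r_difference(r1, r2, n1, n2, level=0.95):
    """Approximate confidence interval for the difference r1 - r2."""
    se = math.sqrt((1 - r1 ** 2) / n1 + (1 - r2 ** 2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = r1 - r2
    return diff - z * se, diff + z * se


n = 253  # number of IQ distance pairs
four_vs_fst = ci_r_difference(0.78, 0.58, n, n)  # four SNPs metagene vs Fst
nine_vs_fst = ci_r_difference(0.65, 0.58, n, n)  # nine SNPs metagene vs Fst

print([round(v, 3) for v in four_vs_fst])
print([round(v, 3) for v in nine_vs_fst])
```

With these inputs the first interval excludes zero and the second does not, in line with the pattern reported above.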


3.3. Regression analysis 


Regression analysis was employed to estimate the relative strength 
of each predictor. 

A total of 325 pairwise comparisons were obtained for the 26 popu- 
lations from the 1000 Genomes database utilized in the present study. 
Distances were calculated as the absolute value of the difference be- 
tween population pairs on the selected variables (country IQ, GWAS 
hit metagene and polygenic score). The country IQ variable had missing 
values, so a total of 253 distances was calculated. Fst distances were cal- 
culated for Chromosome 1, using the methodology employed by Piffer 
(2015b). Chromosome 1 and 21 Fst distances were almost identical 
(r = .992, p < .05), hence only the bigger chromosome (1) was used. 
Beta coefficients are reported in Tables 7a and 7b. 

The ratio of average Betas (Average Beta random factor/Average
Beta Fst) is .3/.366 = .82. The ratio of Betas of GWAS hits to Fst Betas
is .885/.167 = 5.3.

There was a moderate amount of multicollinearity but this was not 
high according to the commonly accepted threshold (Tables 7a and 7b).


1 This is available for viewing at the following URL: https://docs.google.com/ 
spreadsheets/d/1EanD-]j15a30BJpu5Tr5e86iLTLZPpa9VUEPTwaVsHU/edit?usp=sharing. 

2 Average correlation of the four correlation coefficients for the four random polygenic 
scores comprising nine SNPs picked at random. 


Table 7a
GWAS hits metagene distances and Fst: Standardized Beta coefficients. Dependent
variable: IQ differences.

Predictor                       Predictor Beta   Fst distances Beta   VIF
Four SNPs metagene distances    .850             -.093                2.704
Nine SNPs metagene distances    .685             -.033                5.155
Polygenic score distances       1.227            -.409                4.553
Average Beta                    0.92             -0.178

Table 7b
Random SNPs factor distances and Fst: Standardized Beta coefficients. Dependent
variable: IQ differences.

Predictor                       Predictor Beta   Fst distances Beta   VIF
Random factor 1 distances       .317             .431                 1.295
Random factor 2 distances       .609             .036                 5.033
Random factor 3 distances       .227             .372                 6.454
Random factor 4 distances       -.049            .626                 5.652
Average Beta                    0.276            0.366





3.4. ANOVA


The average frequencies of the nine SNP hits for the five 1000 Ge- 
nomes continental groups were calculated. These are represented in a 
boxplot (Fig. 2) and reported in Table 8. 

An analysis of variance (ANOVA) was conducted in order to analyze
the difference between group means (F = 1.235, Pr(>F) = .311).

Tukey's post-hoc test was used to compare means. Confidence inter- 
vals are reported in Table 9. 


3.5. Estimating inter-population variability from average allele frequencies 


Inter-population variability is usually employed as a means of de- 
tecting signals of selection at specific loci. The Fst index is measured at 
a single locus, as it compares inter-population variability to within- 
population variability. Deviation from normality (the average 
genome-wide Fst value between two populations) suggests the pres- 
ence of selection at that locus. Another approach could be applied to 
polygenic traits based on analyzing many loci together. Once the aver- 
age allele frequency of trait-increasing alleles is calculated, it is possible 
to obtain simple measures of inter-population variability, such as the 
standard deviation (SD). The SD of the average allele frequency of the 
nine GWAS hits was .088. This was higher in magnitude than the SD es- 
timated for the average frequency of the four sets of nine random SNPs: 
.043, .032, .078 and .031. For comparison, seven sets of nine SNPs were
also constructed from the 66 height GWAS hits reported in Piffer
(2015c). The average SD was .06 (.073, .057, .0658, .0751, .0286, .0607
and .0587).


Fig. 2. Average frequency of cognitive ability increasing alleles by continental group.


Table 8
Allele frequencies by continental group.

Populations*  rs10457441 C  rs17522122 G  rs10119 G  rs11584700 G  rs4851266 T  rs236330 C  rs17518584 T  rs7923609 G  rs2721173 C
AFR           .195          .696          .621       .057          .104         .34         .151          .29          .775
AMR           .352          .648          .793       .094          .064         .813        .591          .313         .488
ASN           .417          .622          .904       .317          .554         .862        .86           .337         .487
EUR           .533          .511          .713       .229          .387         .767        .628          .487         .539
SAS           .219          .588          .786       .255          .247         .903        .495          .525         .472

* AFR = Sub-Saharan African; AMR = Hispanic/Latin American; ASN = East Asian; EUR = European; SAS = South Asian.
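
The variability comparison in Section 3.5 can be checked directly from the standard deviations reported there. The sketch below only averages and compares the reported figures; the variable names are hypothetical.

```python
from statistics import mean

# Standard deviations of average allele frequency, as reported in Section 3.5.
gwas_intelligence_sd = 0.088
random_set_sds = [0.043, 0.032, 0.078, 0.031]
height_set_sds = [0.073, 0.057, 0.0658, 0.0751, 0.0286, 0.0607, 0.0587]

mean_random_sd = mean(random_set_sds)
mean_height_sd = mean(height_set_sds)

print(round(mean_random_sd, 3), round(mean_height_sd, 3))
print(gwas_intelligence_sd > max(random_set_sds))
print(gwas_intelligence_sd > max(height_set_sds))
```

The SD for the intelligence GWAS hits exceeds every individual random set and every individual height set, not only their averages.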


4. Discussion 


The average frequency (polygenic score) of nine alleles positively
associated with IQ and proxy phenotypes at the individual differences
level in published GWAS is strongly and significantly correlated with
population, or country, IQ (r = .91). Factor analysis of allele frequencies
yielded a metagene factor with a similar correlation with IQ (.86). The
majority of alleles (seven out of nine) loaded positively on this factor.
Forty unrelated SNPs were drawn at random and their frequencies
factor analyzed for use as a control. The pattern of very high-magnitude
positive or negative correlations suggests that spatial autocorrelation
might be inflating the relationships between variables. That is to say,
factors extracted utilizing random SNPs exhibited very high correlations
with the GWAS hits factors (r = .6 to .98) and similarly high correlations
with country IQ distances. Unexpectedly, the method of correlated
vectors also produced very high values when run using the random SNPs,
rendering the extremely high-magnitude and significant correlation (.99,
p < .05) found for the GWAS hits somewhat less impressive. However,
the correlation of IQ with the GWAS hits metagene (.89) was somewhat
higher than the IQ correlation with the random SNP factors (.74). When
entered in a regression, the GWAS hits metagene and the polygenic
score were both higher-magnitude predictors of country IQ than the
random SNPs factor in all instances (Average Betas: 1.03 and .28,
respectively). The variance inflation factor showed the presence of a
moderate amount of multicollinearity but in most instances this was not
high (<10, see Tables 3 and 5). However, in some regressions this was very
high (e.g. VIF > 20, see Table 4) and likely accounted for the inflated
Betas (>1).

Additionally, Fst distances were calculated in order to provide an
estimate of genetic distances between populations, reflecting neutral
evolutionary processes (e.g. migration and drift) but not directional
selection pressure. Distances between populations were calculated as
the absolute value of the difference between pairs (325 unique
pairs). First, the four and nine GWAS hits metagenes and the polygenic
score emerged as better predictors of IQ than Fst distances (Beta = .68
to 1.12 vs 0 to .41). Paradoxically, Fst distances had a negative
relationship with IQ (specifically, IQ differences between populations)
when regressed with the polygenic score (i.e. differences between
populations). The latter had a very high predictive power (Beta = 1.12,
see Table 7a). A set of regressions was run with Fst utilizing random
factors instead of the GWAS hits metagenes as predictors. This revealed
that random factors did not predict IQ better than Fst distances.
Specifically, GWAS hits exhibited Beta coefficients that were about five
times higher in magnitude than those associated with the random SNPs.
Conversely, the latter had Betas equal to or lower than those associated
with Fst distances (ratio = .82). Although the variance inflation factor
was within acceptable limits, it showed the presence of moderate
multicollinearity that might explain the somewhat inflated Betas
(Tables 7a and 7b).


Table 9
Tukey's test with 95% confidence intervals for difference between continental group
means.

Populations   Difference   95% Confidence interval
AMR - AFR     .103         -.217, .424
ASN - AFR     .237         -.083, .557
EUR - AFR     .174         -.146, .494
SAS - AFR     .140         -.180, .461
ASN - AMR     .133         -.186, .454
EUR - AMR     .070         -.249, .391
SAS - AMR     .037         -.283, .357
EUR - ASN     -.062        -.383, .257
SAS - ASN     -.097        -.417, .224
SAS - EUR     -.034        -.354, .287


Comparison of allele frequency means for the five continental 
groups from the 1000 Genomes database revealed frequency differ- 
ences that closely correspond to observed continent-level aggregate 
IQ, yielding the following pattern: East Asian > European > South 
Asian > American (Hispanic) > African. However, ANOVA did not yield 
p values that meet the conventional significance threshold (p <.05), fur- 
thermore Tukey's test produced confidence intervals that bisected zero. 
The lack of statistical significance is clearly due to the very small sample 
size (N = 9). Increasing numbers of GWA studies will undoubtedly pro- 
vide more hits in the future permitting the generation of an increasingly 
accurate picture of cognitive-related genetic variation, both within and 
between populations. 
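
That every interval in Table 9 spans zero, and that the intervals share a common half-width, can be checked directly from the tabulated values. The sketch below transcribes the differences and bounds of Table 9, reading each entry as a decimal fraction; the variable and function names are assumptions introduced for the example.

```python
# (difference, lower bound, upper bound) for each pair, transcribed from Table 9
TABLE_9 = {
    ("AMR", "AFR"): (0.103, -0.217, 0.424),
    ("ASN", "AFR"): (0.237, -0.083, 0.557),
    ("EUR", "AFR"): (0.174, -0.146, 0.494),
    ("SAS", "AFR"): (0.140, -0.180, 0.461),
    ("ASN", "AMR"): (0.133, -0.186, 0.454),
    ("EUR", "AMR"): (0.070, -0.249, 0.391),
    ("SAS", "AMR"): (0.037, -0.283, 0.357),
    ("EUR", "ASN"): (-0.062, -0.383, 0.257),
    ("SAS", "ASN"): (-0.097, -0.417, 0.224),
    ("SAS", "EUR"): (-0.034, -0.354, 0.287),
}


def summarize(table):
    """Half-width, midpoint and zero-coverage of each confidence interval."""
    rows = []
    for pair, (diff, lower, upper) in table.items():
        rows.append({
            "pair": pair,
            "diff": diff,
            "half_width": (upper - lower) / 2,
            "midpoint": (upper + lower) / 2,
            "spans_zero": lower < 0 < upper,
        })
    return rows


summary = summarize(TABLE_9)
for row in summary:
    print(row["pair"], round(row["half_width"], 4), row["spans_zero"])
```

Each interval has a half-width of about .32, which exceeds the largest observed difference (.237 for ASN - AFR), so none of the pairwise contrasts can reach significance with the present number of SNPs.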

When comparing the average frequencies of height and intelligence 
GWAS hits, a striking difference is the much higher inter-population 
variability of the latter, ranging from 33.4% (Luhya, Kenyan) to 61.82% 
(Han Chinese, Beijing). Conversely, the average frequencies of height-increasing
alleles range from 44.76% (Vietnamese) to 49.5% (Esan
Nigerian) (Piffer, 2015c). This might indicate a stronger signal of selec- 
tion for intelligence than for height, thus creating larger variability 
across isolated populations. 

Another possibility is that environmental or social (e.g. cultural, sex- 
ual) factors that affect intelligence-related fitness differentials vary 
more across geographic regions and human populations compared to 
factors affecting height-related fitness differentials. Relevant to this is
the distinction between directional and diversifying selection 
(Holsinger & Weir, 2009). For example, if there has been geographically 
homogeneous sexual selection for taller males (due to intrasexual or in- 
tersexual competition) this would have produced an increase in the fre- 
quency of alleles with a positive effect on height, but the frequency 
differentials would have remained relatively small. Thus, this would re- 
sult in strong directional selection with relatively little diversifying se- 
lection. On the other hand, if factors affecting intelligence-related 
fitness differentials, such as cultural differences in economic conditions 
and reproductive habits such as marriage customs and use of contracep- 
tion, have varied more across populations, this would have resulted in 
high diversifying selection even in the presence of relatively weak 
directional selection. Evolutionary mechanisms such as gene-culture 
co-evolution (e.g. Feldman & Laland, 1996) are good candidates for 
the explanation of the apparent high diversifying selection on intelli- 
gence (Woodley, 2011). For example, higher “genotypic intelligence” 
could cause the development of more complex societies, and life in 
these complex societies might increasingly select for high-IQ genes. 
Culture-gene co-evolutionary mechanisms of this sort have been pro- 
posed by Clark (2006) to account for the apparent rise in human capital 
salient traits among pre-industrial populations in England.
Similar mechanisms have been proposed to account for apparent dysgenic
trends with respect to g among Western countries in the period
1800 to the present (Woodley & Figueredo, 2013).

The higher number of SNPs comprising the height polygenic score
(N = 66) also likely accounts to some extent for the smaller differences;
however, subsets of nine height-increasing SNPs had a lower average SD
relative to the cognitive ability GWAS hits (.06 vs .088). Moreover, the
polygenic scores comprising nine random SNPs exhibit smaller average 
inter-population variability (SD = .046 vs .088). 
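
The comparison of inter-population variability described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used for the paper: the polygenic score is taken as the unweighted mean frequency of the trait-increasing alleles, the SD is the sample SD across populations, and all population labels, SNP labels and frequencies in the toy data are randomly generated placeholders.

```python
import random
import statistics


def polygenic_score(freqs):
    """Unweighted mean frequency of trait-increasing alleles in one population."""
    return sum(freqs) / len(freqs)


def between_population_sd(freq_table, snp_ids):
    """Sample SD, across populations, of the score built from snp_ids.

    freq_table maps population -> {snp_id: allele frequency}.
    """
    scores = [polygenic_score([pop[s] for s in snp_ids])
              for pop in freq_table.values()]
    return statistics.stdev(scores)


def mean_sd_of_random_subsets(freq_table, snp_pool, k=9, n_draws=1000, seed=0):
    """Average between-population SD over random k-SNP subsets of snp_pool."""
    rng = random.Random(seed)
    sds = [between_population_sd(freq_table, rng.sample(snp_pool, k))
           for _ in range(n_draws)]
    return statistics.mean(sds)


# Toy data (assumption): 5 placeholder populations x 66 placeholder SNPs.
rng = random.Random(42)
snp_pool = [f"snp{i}" for i in range(66)]
toy = {f"pop{p}": {s: rng.uniform(0.2, 0.8) for s in snp_pool}
       for p in range(5)}
print(round(mean_sd_of_random_subsets(toy, snp_pool), 3))
```

Drawing subsets of the same size as the intelligence score (k = 9) keeps the comparison fair, since a score averaged over more SNPs will show less variability by construction.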

Encouraging results for the study of the evolution of height and in- 
telligence come from a recent paper presenting evidence of directional 
dominance on height and intelligence (but not on other traits such as 
blood pressure and low density lipoprotein cholesterol). This study pro- 
vides evidence that increased stature and cognitive capacity have been 
recently positively selected in human evolution (Joshi et al., 2015) and
provides additional context for the results of the present paper.

It should be noted that all of the nine alleles are present at significant 
frequencies (>5%) among all the five major races (Sub-Saharan African, 
South Asian, European, East Asian, American) (see Table 8). Thus, the in- 
telligence polymorphisms do not appear to be race-specific but were al- 
ready present in Homo sapiens prior to the African exodus circa 60-100 
Kya. This is even more remarkable, given that the GWAS samples 
consisted mostly of individuals of European descent and that none of
the GWAS hits appears to be a European-specific polymorphism
(Table 8). It is thus likely that the vast majority of mutations affecting
intelligence were already present in the ancestral African population and
that, as humans settled in different parts of the world, these polymorphisms
were subject to directional selection pressure, which produced an overall
increase in human intelligence at different rates in different
geographical areas. For the same reason, if non-European
intelligence-increasing polymorphisms exist, these are likely to represent a minority
of the additive genetic variation contributing to differences in 
intelligence. 

Population structure due to migrations and genetic drift constitutes
noise that this study attempted to isolate from the selection signals.
In addition, there may be pleiotropic effects on traits other
than IQ, so that an “IQ gene” could be subject to other kinds of selection
pressure (e.g. via selection on blood pressure, body mass index, etc.).
This complicates the picture and makes it more difficult to isolate a
selection signal from genetic variants each with a small effect on the
phenotype. Analyzing greater numbers of genes is thus very important, because
the confounding effects of pleiotropy should tend to cancel out over
a large number of alleles, given that selection pressures operate in different
directions for different phenotypes. For any single allele, however,
pleiotropy can still “deviate” the frequency. Reliance on a relatively limited number
of populations (N = 26) also limits the significance of the results and 
does not enable us to completely rule out the null hypothesis that migra- 
tions or genetic drift account for the reported country-level effects. 

Molecular population genetics is a rapidly expanding field and in the 
near future more cognitive ability-related genes will undoubtedly be 
identified. These will permit the hypothesis presented in this paper to 
be more comprehensively tested. In addition, rapid progress in the
sequencing of ancient genomes from fossils and from human remains
from historical periods will enable us to calculate their “intelligence
polygenic score” and to examine historical change in intelligence. We
will be able to employ these data in addressing paradoxes. For example,
why did the apparently large-brained Neanderthal lack the ecological
flexibility necessary to outcompete H. sapiens? Alternatively, on a smaller
time-scale, recent dysgenic trends in g in industrial societies can be
quantified at the molecular level, corroborating possible phenotypic
evidence of this (Woodley of Menie, 2015; Woodley of Menie, Fernandes,
Figueredo, & Meisenberg, 2015).

Finally, this method can be “reverse-engineered” to aid in the detec- 
tion of new GWAS hits by selecting polymorphisms whose frequencies 
correlate with the polygenic score or selection factor. These genes (or 
“polygenes”) will have a higher probability of being intelligence-related
genes, thus reducing the need for extremely large samples and
the reliance upon ‘chance capitalization’ typical of current intelligence 
GWA studies (Piffer & Gilfoyle, 2014). 
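
The “reverse-engineering” step can be sketched as a simple ranking of candidate polymorphisms by the correlation of their population frequencies with the polygenic score. The Python example below is a minimal illustration under stated assumptions: the SNP labels, frequencies and scores are invented placeholders, the function names are hypothetical, and a plain Pearson correlation is used without any correction for population structure, which would have to be addressed in practice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_candidates(candidate_freqs, polygenic_scores):
    """Sort candidate SNPs by correlation with the polygenic score (highest first).

    candidate_freqs maps snp_id -> list of allele frequencies, one per
    population, in the same population order as polygenic_scores.
    """
    scored = [(snp, pearson(freqs, polygenic_scores))
              for snp, freqs in candidate_freqs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder data (assumption): four populations, three candidate SNPs.
scores = [0.33, 0.45, 0.50, 0.62]
candidates = {
    "snp_a": [0.30, 0.44, 0.52, 0.60],
    "snp_b": [0.50, 0.50, 0.40, 0.50],
    "snp_c": [0.60, 0.50, 0.45, 0.30],
}
for snp, r in rank_candidates(candidates, scores):
    print(snp, round(r, 2))
```

Candidates at the top of such a ranking would then be carried forward for testing in independent association samples, rather than being treated as hits in their own right.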

Appendix A1


Vcftools code:

cd c:/folder/...   # set to the directory containing the 1000 Genomes vcf file

c:/Users/.../vcftools/bin/vcftools --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2
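
Because Fst is computed pairwise, the command above has to be repeated for every pair of populations. The following Python sketch builds (but does not run) one such command per pair, reusing only the vcftools options shown above. The population labels, the assumption that each population has a sample list named after it (e.g. POP1.txt), and the assumption that vcftools is on the search path are illustrative choices, not part of the original procedure.

```python
from itertools import combinations

# File name as given in the command above (chromosome 1 only).
VCF = "ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf"


def fst_commands(populations, vcftools="vcftools", vcf=VCF):
    """Return one vcftools argument list per unordered pair of populations."""
    commands = []
    for a, b in combinations(populations, 2):
        commands.append([
            vcftools,
            "--vcf", vcf,
            "--weir-fst-pop", f"{a}.txt",   # sample list for the first population
            "--weir-fst-pop", f"{b}.txt",   # sample list for the second population
            "--out", f"fst.{a}.{b}",
        ])
    return commands


# Placeholder labels (assumption); 26 populations give 26 * 25 / 2 = 325 pairs.
pops = [f"POP{i}" for i in range(1, 27)]
commands = fst_commands(pops)
print(len(commands))
print(" ".join(commands[0]))
```

Each argument list can be passed to `subprocess.run` once the population files are in place; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.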

R code and datasets can be downloaded from: osf.io/jt73x 


Appendix B. Supplementary data 


Supplementary data to this article can be found online at
http://dx.doi.org/10.1016/j.intell.2015.08.008.